ABSTRACT

Minority and majority groups were administered a 
special quantitative section of the Admission Test for Graduate Study 
in Business (ATGSB) under varying time conditions to determine if 
increasing the time allotted for the test would eliminate any bias 
which may exist due to an irrelevant speed factor. By a commonly
employed definition, the special section was found to be moderately
speeded for all candidates under normal conditions. Neither the main
effect due to time condition nor the interaction between the ethnic
and time factors reached significance, suggesting that increasing
the time per item does not reduce any bias which may exist in the
test. Although a substantial proportion of minority group scores fell
at or below the chance level, these scores appeared to retain fairly 
high levels of reliability. (Author) 
A STUDY OF TEST SPEEDEDNESS AS A POTENTIAL SOURCE OF BIAS IN THE ADMISSION
TEST FOR GRADUATE STUDY IN BUSINESS QUANTITATIVE SCORE¹

Franklin R. Evans and Richard R. Reilly 
Educational Testing Service 

Standardized academic aptitude tests have been the subject of persistent 
criticism from members of certain minority groups who charge that such tests 
are unfair to members of their groups. Flaugher (1970) in a recent review of 
testing practices with respect to minority groups discussed three potential 
sources of unfairness which may be summarized as: (a) those having to do
with test content, (b) the conditions or circumstances under which standardized
tests are administered, and (c) the way in which test scores are actually 
used. Much recent research has centered on the third possible source of 
bias, with most researchers considering a test unbiased if the regression of 
the criterion scores on the test is the same for both groups. Thorndike (1971) has 
demonstrated that the use of a test that is unbiased by this definition will 
result in the screening out of a larger proportion of the minority group 
candidates than would be the case if the test were perfectly valid. Thorndike 
also pointed out that, "...one cannot appraise the 'fairness' of a test through
its correlation with an 'unfair' criterion."

In the absence of specific knowledge about how much bias, if any,
exists in the criterion, the second possible source of bias mentioned above,
the conditions under which tests are administered, might be profitably investigated
as a possible biasing factor. One common complaint about test administration
conditions made by spokesmen from culturally disadvantaged groups is that standardized
academic aptitude tests are too highly speeded. Almost all academic aptitude and
achievement tests purport to be primarily measures of "power" and not speed; i.e.,
most candidates are expected to have the opportunity to attempt all or almost
all of the items in the allotted time. A recent study by Evans and Reilly
(1972) investigated the effects of varying the speededness of a reading comprehension
section of the Law School Admission Test (LSAT) on minority versus majority
group performance. Although they reported that lowering the degree of speededness
in a reading comprehension section of the LSAT did not benefit one group more
than the other (in terms of gain in mean score level), it was concluded that
this section was more speeded for minority than for majority group candidates.
The purpose of the present investigation was to extend these results to a test
with a lower verbal "load."

The Admission Test for Graduate Study in Business (ATGSB) is designed to 
yield a quantitative as well as a verbal score. The present study was conducted 
with the intent of determining: (1) if the quantitative section of the ATGSB
is more speeded for Black examinees than for White, and (2) if reducing or
increasing the degree of speededness has a differential effect on the scores of
minority versus majority group members.

Procedure 



Subjects

The ATGSB is a nationally administered test, and data for the present
study were collected from a regularly scheduled ATGSB administration in 1971.
In addition, data were collected from candidates taking the ATGSB at 26 centers
in predominantly Black colleges in the Southeastern United States where no test
fee was charged. A special research section was included with the five operational
sections of the ATGSB, and three different versions of this special section
were administered by spiralling forms so that roughly one-third of the candidates
in the study took each of the special forms.

Regular center Black candidates (RCB) and regular center White candidates
(RCW) were identified within the national sample by means of a set of background
questions. This same set of questions was administered at the special centers,
but since the overwhelming majority of candidates at those centers was Black,
only one special group, special center Black (SCB), was identified for purposes
of the study.

Experimental Subtests 

In order to investigate the research questions posed above it was necessary 
to create experimental forms of a test which differed only in the degree to 
which they were speeded. The most obvious way to accomplish this would be to 
administer identical tests under different time conditions. Because of the
constraints imposed by the standardization necessary for a national testing program,
however, it was necessary to find an alternative method of varying speed. In
this study, three different speed conditions were created by varying the number
of five-choice mathematics items within the special section while holding the
time limit constant. Although all candidates took the special section under the
40-minute time limit, Form A of the special section had 25 items; Form B, 30;
and Form C, 35. The additional items in Forms B and C were placed among a set
of 25 items common to all forms.
Scores on these 25 items served as the dependent variable for most of the analyses
in the present study. Table 1 shows the order of the items in the three special
sections. The net effect of this was that, rather than the absolute time allowed
being varied, the amount of time per item was changed.

Insert Table 1 about here

Under normal circumstances
the average time per item on the ATGSB quantitative section is about 82 seconds.
This may be compared with the averages of 96 seconds for Form A, 80 seconds for
Form B, and 69 seconds for Form C. Thus, the time conditions under which the
special section was administered included one "normal" condition (Form B),
one "speeded" condition (Form C), and one "unspeeded" condition (Form A).
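The per-item averages quoted above follow directly from the fixed 40-minute limit and the three item counts. A minimal sketch of the arithmetic (the function and variable names are illustrative, not from the original report):

```python
# Time per item when the time limit is fixed and only the item count varies.
TIME_LIMIT_MINUTES = 40  # time limit stated in the text for the special section

# Item counts per form, as given in the text.
ITEMS_PER_FORM = {"A": 25, "B": 30, "C": 35}


def seconds_per_item(time_limit_minutes: float, n_items: int) -> float:
    """Average number of seconds available per item."""
    return time_limit_minutes * 60 / n_items


per_item = {form: seconds_per_item(TIME_LIMIT_MINUTES, n)
            for form, n in ITEMS_PER_FORM.items()}

for form, secs in per_item.items():
    # Rounded to whole seconds, this gives 96, 80, and 69 for Forms A, B, and C.
    print(f"Form {form}: {secs:.0f} seconds per item")
```

Form B (80 seconds) is therefore the closest of the three to the roughly 82 seconds per item cited for the operational quantitative section.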

Study Design 

Because the candidates taking the ATGSB at special centers were not typi- 
cal of candidates in general, it was decided to keep the RCB and SCB groups 
separate for purposes of analysis. Thus, the study design was a 3 x 3 in which 
specific attention was focused on the interaction between time condition and 
ethnic/center group. That is, there was less interest in the main effects due 
to test forms and less still in the main effects due to the group factor than 
there was in the possible differential effects due to speededness among the 
three groups. 

Results and Discussion 

Insert Figures 1a, 1b, and 1c about here

The first question this study attempted to answer was whether the ATGSB
quantitative test is more speeded for Blacks than for Whites. According to
criteria used by Swineford (1956), a test may be considered unspeeded if (1)
virtually all candidates reach 75 per cent of the items and (2) at least 80
per cent of the candidates respond to the last item. As Figures 1a, 1b, and 1c
show, the special section appeared to be a speeded test under all three time
conditions for all three groups.² The items in the special section were both
ordered in difficulty (from easy to difficult) and corrected for guessing,
which means that many individuals may not have attempted the last few items,
not because they ran out of time, but because the items were simply too
difficult and they chose not to guess. It is clear from examining the figures
that a substantial proportion of dropouts (20 per cent) occurred, in general,
somewhat earlier for the Black group than for the White group. The differences
observed in dropout rate in this study were not nearly as striking as
those reported in an earlier study of a test in which the items were neither
ordered in difficulty nor corrected for guessing (Evans & Reilly, 1972). It
is also worth noting that, unlike that study, the present one found no clear
relationship between the mean scores of the various groups and the rate at which
members of these groups failed to complete the test. (Table 2 shows mean score
levels³ for the three different groups taking each test form.)

Insert Table 2 about here

In fact, under the more speeded conditions (Forms B and C) the group with the
lowest mean scores (SCB) actually exhibited the lowest dropout rate.
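The two Swineford criteria cited above can be expressed as a simple check on response records. The sketch below is illustrative only: the data layout (one list per candidate, `None` for an item with no response) and the 0.99 cutoff standing in for "virtually all" are assumptions, since the source does not give a numerical value for that phrase.

```python
from typing import Optional, Sequence

Responses = Sequence[Optional[str]]  # one entry per item; None means no response


def items_reached(responses: Responses) -> int:
    """Position of the last answered item (0 if nothing was answered)."""
    for position in range(len(responses), 0, -1):
        if responses[position - 1] is not None:
            return position
    return 0


def is_unspeeded(group: Sequence[Responses], virtually_all: float = 0.99) -> bool:
    """Apply both criteria: (1) virtually all reach 75% of items, (2) 80% answer the last item.

    `virtually_all` is a hypothetical cutoff chosen for this sketch.
    """
    n_items = len(group[0])
    reached = [items_reached(r) for r in group]
    share_reaching_75 = sum(n >= 0.75 * n_items for n in reached) / len(group)
    share_answering_last = sum(r[-1] is not None for r in group) / len(group)
    return share_reaching_75 >= virtually_all and share_answering_last >= 0.80


# Tiny made-up example with a four-item test.
finished = ["A", "B", "C", "D"]
stopped_after_3 = ["A", "B", "C", None]
stopped_after_2 = ["A", "B", None, None]

mostly_finished = [finished] * 4 + [stopped_after_3]
many_dropouts = [finished] * 3 + [stopped_after_2] * 2

print(is_unspeeded(mostly_finished))  # both criteria met
print(is_unspeeded(many_dropouts))    # too few candidates reach 75% of the items
```

As the text notes, a check of this kind cannot distinguish candidates who ran out of time from those who chose not to guess on difficult final items.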

The next question the study attempted to answer was whether raising or lowering
the degree of speededness of the quantitative section of the ATGSB would
differentially affect scores among the three ethnic/center groups identified,
and, in particular, whether increasing the amount of time per item would more
greatly benefit the minority groups. One possible approach would have been to
examine the interaction effect between the time condition and group factors in a
3 x 3 factorial analysis of variance, but the authors chose to analyze the data in
a slightly different way in an attempt to reduce some of the error due to a less
than optimum procedure for assigning individuals to treatment (time condition).
It was clear that scores on a regular quantitative section of the ATGSB would be
highly correlated with scores on the special section, but it was equally clear
that some of the necessary assumptions for using that score as a covariate in an
analysis of covariance could not be met. Although individuals were assigned
approximately randomly within ethnic/center group, there were rather large
differences (as can be seen in Table 3) in mean scores on the covariate across the
three ethnic/center groups, and there was no guarantee that the same regression
line could be used to "adjust" scores for all three groups.

Insert Table 3 about here

Cautions against
using analysis of covariance in this type of situation have become almost cliché
(e.g., Evans & Anastasio, 1968; Lord, 1967). In this study primary interest was
in the interaction effect, and since assignment within group was approximately
random, a slightly different linear model was used in an attempt to overcome the
aforementioned shortcomings of traditional analysis of covariance. The following
model was used:

$$ y_{ijk} = \alpha_i + \beta_i x_{ijk} + \delta_j + \gamma_{ij} + e_{ijk} $$

where x and y scores are expressed as deviations from their respective grand means;
$\beta_i$ and $\alpha_i$ represent the within-group slopes and intercepts respectively;
$x_{ijk}$ is a covariate; $\delta_j$ is the main effect due to time condition;
$\gamma_{ij}$ is the interaction between group and time condition; and $e_{ijk}$ is
the error term.

This model actually uses separate regression lines derived from data across
treatments within a given ethnic/center group, and in effect, the dependent variable
becomes the deviation of the special section score about the within-group
regression line. Obviously, any main effects due to ethnic/center group differences
could not be tested with such a model. In the present study, however,
there was no interest in testing group differences, and the use of separate
within-group regression lines should have had the effect of merely reducing the error
term to be used in testing the hypotheses of interest, that is, the main effects
due to time condition, and especially the interaction effect. It may be
of interest to some readers to note that this analysis was performed using
a standard multiple regression computer program. The results of the analysis,
for comparative purposes, are presented in Table 4 along with the results
of an analysis of variance performed on the same data.
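A model of this kind (separate within-group intercepts and slopes on the covariate, plus time-condition and group-by-time terms) can be fitted with ordinary least squares on coded variables. The sketch below is not the authors' program. It assumes effect (sum-to-zero) coding, tests each effect by the drop in R² when its columns are removed, and takes the error degrees of freedom as the number of observations minus the number of fitted columns, which may differ from the counting used in the report. The data are simulated with hypothetical values purely so the code runs.

```python
import numpy as np


def effect_codes(labels):
    """Sum-to-zero coding: one column per level except the last, which is coded -1."""
    levels = np.unique(labels)
    codes = np.zeros((labels.size, levels.size - 1))
    for col, level in enumerate(levels[:-1]):
        codes[labels == level, col] = 1.0
    codes[labels == levels[-1], :] = -1.0
    return codes


def r_squared(design, y):
    """Squared multiple correlation of y (already centered) on the design columns."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return 1.0 - (residual @ residual) / (y @ y)


def time_condition_tests(x, y, group, form):
    """F ratios for the time-condition and group-by-time effects."""
    x = x - x.mean()  # scores expressed as deviations from grand means
    y = y - y.mean()
    levels = np.unique(group)
    intercepts = (group[:, None] == levels[None, :]).astype(float)
    slopes = intercepts * x[:, None]  # a separate covariate slope for each group
    time = effect_codes(form)
    group_codes = effect_codes(group)
    interaction = np.hstack([group_codes[:, [g]] * time
                             for g in range(group_codes.shape[1])])
    full = np.hstack([intercepts, slopes, time, interaction])
    df_error = y.size - full.shape[1]
    r2_full = r_squared(full, y)

    def f_ratio(reduced, df_effect):
        r2_reduced = r_squared(reduced, y)
        return ((r2_full - r2_reduced) / df_effect) / ((1.0 - r2_full) / df_error)

    return {
        "r2_full": r2_full,
        "df_error": df_error,
        "F_time": f_ratio(np.hstack([intercepts, slopes, interaction]), time.shape[1]),
        "F_interaction": f_ratio(np.hstack([intercepts, slopes, time]),
                                 interaction.shape[1]),
    }


# Simulated data: 3 groups x 3 forms, 70 cases per cell, no true time effects.
rng = np.random.default_rng(0)
cells = [(g, f) for g in ("RCB", "RCW", "SCB") for f in ("A", "B", "C")]
group = np.repeat([g for g, _ in cells], 70)
form = np.repeat([f for _, f in cells], 70)
hypothetical_means = {"RCB": 13.0, "RCW": 27.0, "SCB": 6.5}
x = np.array([hypothetical_means[g] for g in group]) + rng.normal(0.0, 8.0, group.size)
y = 0.4 * x + rng.normal(0.0, 3.0, group.size)

result = time_condition_tests(x, y, group, form)
print(result)
```

Because the group intercepts and slopes are always retained, the group main effect is absorbed into the baseline and cannot be tested, which matches the limitation noted in the text.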



Insert Table 4 about here

Neither of the tests of interest reached significance, and the
proportions of variance accounted for by the time condition and the interaction
effects, respectively, were both near zero in the analysis of variance.
Thus, raising or lowering the time per item appears to have had almost no
effect in changing score levels and no differential effect among ethnic/center
groups. The results of this study appear even more conclusive than the results
of an earlier study (Evans & Reilly, 1972) in the verbal domain, where
a significant main effect due to time per item was observed and a slight but not
significant beneficial effect for Black candidates resulted when the time per
item was increased. Aside from the possibility that quantitative abilities are
more resistant to speed effects than verbal abilities, the major reason for this
discrepancy in results may be the way in which the experimental sections in each
of the studies were constructed. In the previous study, the items were not clearly
ordered for difficulty and there was no correction for guessing, while the reverse
was true for the experimental section in the present investigation. Both
ordering items in terms of their difficulties and correcting for guessing would
tend to weaken any effects due to speed.

Insert Table 5 about here

Kuder-Richardson (KR-20) reliabilities were computed for each group under
each time condition and are presented in Table 5 along with the correlations
between the special section and another quantitative section (Q), which may be
regarded as approximating parallel forms reliabilities.
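For readers unfamiliar with the coefficient, the standard KR-20 formula is k/(k-1) times one minus the ratio of summed item variances (p times q) to total-score variance. The sketch below applies it to a rights/wrongs (1/0) item matrix. How omitted items and the correction for guessing were handled in the reliabilities reported here is not stated in the source, so the simple 0/1 scoring is an assumption, and the small data matrix is made up.

```python
import numpy as np


def kr20(item_scores: np.ndarray) -> float:
    """KR-20 for a candidates-by-items array of 0/1 item scores."""
    k = item_scores.shape[1]
    p = item_scores.mean(axis=0)                # proportion correct per item
    sum_item_variances = (p * (1.0 - p)).sum()  # sum of p*q over items
    # Population variance (ddof=0) of total scores, to match the p*q item variances.
    total_variance = item_scores.sum(axis=1).var()
    return (k / (k - 1)) * (1.0 - sum_item_variances / total_variance)


# Made-up data: four candidates, three items.
demo = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
])
print(kr20(demo))  # 0.75 for this matrix
```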

It should be noted that section Q could not be considered truly parallel
since Q differs in number of items (55 for Q vs. 25 for the special section)
and was administered under invariant time conditions. However, the special
section was constructed in such a way as to approximate the item characteristics
of the regular quantitative section of the ATGSB. The distributions
of item difficulties and item-total correlations were approximately the same.
The "parallel form" reliabilities reported in the lower half of Table 5 were
corrected to account for the differing number of items by use of the formula

$$ r_{xy} \, \frac{N_x \, \sigma_y}{N_y \, \sigma_x} $$

where
$r_{xy}$ is the correlation between the two forms,
$N_x$ is the number of items in the shorter of the forms,
$N_y$ is the number of items in the longer of the forms,
and $\sigma_x$ and $\sigma_y$ are the standard deviations of the forms.
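A small sketch of the length correction, assuming it multiplies the observed correlation by N_x σ_y / (N_y σ_x), with x the shorter form and y the longer. That reading of the formula is an assumption. The standard deviation used for section Q below is hypothetical, because the report's tables do not give it; the other inputs are the RCB Form A values from Tables 2 and 3.

```python
def length_corrected_r(r_xy: float, n_short: int, n_long: int,
                       sd_short: float, sd_long: float) -> float:
    """Correlation between two forms, adjusted for their differing lengths.

    Assumed form of the correction: r_xy * (n_short * sd_long) / (n_long * sd_short).
    """
    return r_xy * (n_short * sd_long) / (n_long * sd_short)


# Illustration: r = .82 and S.D. = 4.90 for the 25-item special section (RCB, Form A).
# The S.D. of 9.0 for the 55-item section Q is a made-up value for this example.
estimate = length_corrected_r(r_xy=0.82, n_short=25, n_long=55,
                              sd_short=4.90, sd_long=9.0)
print(round(estimate, 2))
```

When the two forms have the same length and the same standard deviation, the factor equals one and the correlation is returned unchanged.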



As can be seen in Table 5, the K-R coefficient tends to underestimate the parallel
forms reliability under less speeded conditions (Form A) and to overestimate
this coefficient under more speeded conditions (Form C). This pattern
is rather interesting, but the differences are not in general very large,
especially in the two regular center groups. However, in the SCB group it may
be noted that the single form reliability (K-R) overestimates the "parallel
forms" reliability by a large amount for all three forms, possibly reflecting
the presence of a large speed component for all three SCB groups.
Speed may have had the effect of increasing the proportion of error
variance for the SCB group, thereby spuriously increasing the estimate of
reliability. As mentioned earlier, the dropout rates shown in Figures 1a, 1b,
and 1c probably do not reflect accurately the effects of speed in a test which
is both ordered for difficulty and corrected for chance success.

It is also worth noting that even though in two of the groups (SCB and RCB)
a substantial portion of the special section's score range was near or below the
chance level, the scores remained fairly reliable. The reliability and predictive
value of scores in the chance range have been reported in the literature (Boldt,
1968) and are of interest, since in order to obtain reliable less-than-chance
scores individuals have to be operating under a decidedly less than optimal
guessing strategy.
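The "chance level" referred to here follows from the scoring rule given in the footnotes (rights minus one-fourth of the wrongs on five-choice items): a candidate who guesses blindly on every item has an expected corrected score of zero. A minimal sketch, using exact fractions to avoid rounding (the function names are illustrative):

```python
from fractions import Fraction

N_CHOICES = 5  # the special section used five-choice items


def corrected_score(rights: int, wrongs: int, n_choices: int = N_CHOICES) -> Fraction:
    """Formula score: rights minus wrongs / (n_choices - 1); omitted items count as zero."""
    return rights - Fraction(wrongs, n_choices - 1)


def expected_score_blind_guessing(n_items: int, n_choices: int = N_CHOICES) -> Fraction:
    """Expected corrected score when every item is answered by random guessing."""
    p_right = Fraction(1, n_choices)
    expected_rights = n_items * p_right
    expected_wrongs = n_items * (1 - p_right)
    return expected_rights - expected_wrongs / (n_choices - 1)


print(expected_score_blind_guessing(25))  # 0: the chance level on the 25 common items
print(float(corrected_score(10, 15)))     # 6.25
print(float(corrected_score(0, 25)))      # -6.25, the lowest score possible on 25 items
```

The lowest possible value, -6.25, matches the lower end of the ranges reported for the RCW group in Table 2.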

One of the criticisms leveled at standardized tests by minority group spokesmen
has been that performance on such tests is to some degree dependent on a
"test-wiseness" not possessed by minority group members. The ability to handle
guessing instructions appropriately and adopt optimum or near optimum guessing
strategies is certainly related to test-wiseness, and it may be that the relative
unfamiliarity of some minority group members with guessing instructions tends to
put them at a slight disadvantage. It should be mentioned, however, that most
national academic testing programs do not, in fact, report negative scores as such
but rather scale below-chance scores upward to the lowest positive standardized
score. Individuals receiving negative scores on the ATGSB, for example, would in
practice have a reported score of 200. Research should probably be done,
nevertheless, to answer some of the questions surrounding test-taking strategies
as they relate to minority groups.



Conclusions 



The ATGSB quantitative section appeared to be a moderately speeded measure
for both majority and minority group candidates, but neither increasing nor
decreasing the time per item appeared to result in any differential effect among
the three ethnic/center groups included in the study, suggesting that such a
procedure would not eliminate any bias possibly due to speededness.

Footnotes

¹ The authors wish to thank The Graduate Business Admissions Council,
which supported this research.

² More detailed item data are presented in the Appendix.

³ Unless otherwise noted the dependent variable or "score" referred to will
be the corrected score on the 25 items common to Forms A, B, and C; i.e., the
sum of the rights minus 1/4 the sum of the wrongs. All scores discussed are
corrected scores.

⁴ Random sampling procedures were used to create an orthogonal design by
sampling down to the smallest cell size.

⁵ The R² values are the squared multiple correlations with the dummy variables
representing each factor indicated removed.



Table 1. Order of the items in the three special sections, Forms A, B, and C (table body not recoverable)



Table 2

Descriptive Statistics for Groups in the Study Sample
(25 Item Corrected Scores)

Group   Statistic   Form A            Form B            Form C

RCB     N           89                96                70
        Mean        6.59              7.42              6.16
        S.D.        4.90              4.97              4.79
        Range       -3.75 to 18.75    -5.0 to 22.5      -5.0 to 18.75
        % ≤ 0 ᵃ     21.2              15.6              35.7

RCW     N           432               435               464
        Mean        13.79             13.08             12.30
        S.D.        5.26              5.09              7.70
        Range       -2.5 to 25        -6.25 to 25       -6.25 to 25
        % ≤ 0       1.8               1.6               2.5

SCB     N           245               249               239
        Mean        4.21              4.04              3.46
        S.D.        3.79              3.47              3.45
        Range       -3.25 to 17.5     -1.75 to 16.75    -4.75 to 13.75
        % ≤ 0       13.5              14.1              19.7

ᵃ % ≤ 0 indicates the percentage of candidates whose corrected
scores were zero or less.
Table 3

Mean Scores for Section Q, and Correlations
between Section Q and Special Section (S)

                              Time Condition

          Unspeeded              Normal                 Speeded
Group     N     Mean Q   r_qs    N     Mean Q   r_qs    N     Mean Q   r_qs

RCB       89    12.76    .82     96    14.01    .79     70    15.48    .84
RCW       452   26.85    .81     455   27.09    .79     464   26.66    .79
SCB       245   6.71     .65     249   6.49     .70     259   6.01     .65
Table 4

Summary of Analysis of Covariance and Analysis of Variance

Covariance Analysis

Factor                 R²       d.f.⁴     F

Ethnic/Center Group    .7758    2/627     5.51*
Test Form              .7771    2/627     1.69
Group X Form           .7769    4/627     .64
All Variables          .7783

Analysis of Variance

                                                      Proportion of Variance
Factor                 R²       d.f.      F           Accounted for (R²)⁵

Ethnic/Center Group    .2670    2/650     66.65*      .39
Test Form              .3921    2/650     1.61        .01
Group X Form           .5948    4/650     .10         .00
All Variables          .3952

*p < .05
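Footnote 5 describes each R² in Table 4 as the squared multiple correlation obtained with the dummy variables for the indicated factor removed. Assuming the F ratios were formed in the usual way from the increment in R² (an assumption, since the report does not spell out the computation), the Test Form entry of the covariance analysis can be checked as follows:

```latex
F = \frac{\left(R^2_{\text{all}} - R^2_{\text{removed}}\right)/df_1}
         {\left(1 - R^2_{\text{all}}\right)/df_2}
  = \frac{(.7783 - .7771)/2}{(1 - .7783)/627}
  = \frac{.000600}{.000354}
  \approx 1.70
```

This is close to the tabled value of 1.69; a difference of that size is consistent with the R² values having been rounded to four places.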
Table 5

Kuder-Richardson and "Parallel Forms" Reliabilities
for Ethnic/Center Group and Test Form

Group     Form A    Form B    Form C

Kuder-Richardson Coefficients

RCB       .79       .81       .83
RCW       .79       .80       .81
SCB       .82       .82       .8[illegible]

"Parallel Forms" Reliabilities

RCB       .70       .74       .80
RCW       .71       .69       .76
SCB       .55       .67       .51
Figure 1a. Percent Items Reached for White Regular Center (WRC)

Figure 1b. Percent Items Reached for Black Regular Center (BRC)

Figure 1c. Percent Items Reached for Black Special Center (BSC)
Appendix A

Item Deltas and Percentages of Attempts, Not Reached, and
Correct Responses for Items Common to Forms A, B, and C

* % Correct is based on number attempting each item.

** Delta is the normal deviate, expressed in terms of a scale with mean of 13 and standard deviation of 4, which
corresponds to the proportion of candidates attempting the item who answer it correctly.

*** Not reached is the percent of people who did not attempt each item and did not attempt any subsequent items.
For the last item it is simply the percent of people who did not respond to the item.



